Self-contained Entity Discovery from Captioned Videos

Authors

Abstract

This article introduces the task of visual named entity discovery in videos without the need for task-specific supervision or external knowledge sources. Assigning specific names to entities (e.g., faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating entities with labels. To bypass the annotation burden of this setup, several works have investigated utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when these sources are not provided, and they can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities in videos from the videos and their corresponding captions or subtitles. We introduce a three-stage method in which we (i) create bipartite entity-name graphs from frame–caption pairs, (ii) find visual entity agreements, and (iii) refine the entity assignment through entity-level prototype construction. To tackle this new problem, we outline two benchmarks, SC-Friends and SC-BBT, based on the TV series Friends and The Big Bang Theory. Experiments on these benchmarks demonstrate the ability of our approach to discover which named entity belongs to which face or scene, with an accuracy close to that of an oracle, using only the multimodal information present in the videos. Additionally, qualitative examples show the potential challenges of self-contained discovery of any visual entity for future work. The code and data are available on GitHub.
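To make the three stages more concrete, the toy sketch below walks through them on a handful of hypothetical frame–caption pairs. It is not the authors' implementation. It assumes that faces have already been grouped into track ids by some upstream tracker and that names have already been extracted from each caption; plain majority voting stands in for stage (ii) and nearest-mean matching stands in for stage (iii). The functions `build_bipartite_edges`, `find_agreements`, and `refine_with_prototypes`, as well as all data, are illustrative inventions.

```python
from collections import Counter, defaultdict
from math import dist

# Toy sketch, NOT the authors' implementation. Assumptions (hypothetical):
# each detection is (track_id, embedding) from an upstream face tracker,
# and caption names have already been extracted into a set per frame.


def build_bipartite_edges(pair):
    """Stage (i): link every visual entity in a frame to every name in its caption."""
    detections, names = pair
    return [(track, name) for track, _ in detections for name in sorted(names)]


def find_agreements(pairs):
    """Stage (ii), simplified: majority vote over all frame-caption graphs."""
    votes = defaultdict(Counter)
    for pair in pairs:
        for track, name in build_bipartite_edges(pair):
            votes[track][name] += 1
    # Each track takes the name it co-occurs with most often.
    return {track: counts.most_common(1)[0][0] for track, counts in votes.items()}


def refine_with_prototypes(pairs, track_names):
    """Stage (iii), simplified: mean embedding per name, then nearest-prototype relabeling."""
    members = defaultdict(list)
    for detections, _ in pairs:
        for track, emb in detections:
            name = track_names.get(track)
            if name is not None:
                members[name].append(emb)
    prototypes = {
        name: tuple(sum(axis) / len(embs) for axis in zip(*embs))
        for name, embs in members.items()
    }
    # Relabel every detection individually, so track-level mistakes can be corrected.
    return [
        [min(prototypes, key=lambda n: dist(emb, prototypes[n])) for _, emb in detections]
        for detections, _ in pairs
    ]


# Hypothetical frame-caption pairs with 2-D embeddings.
pairs = [
    ([("t1", (0.0, 0.1)), ("t2", (1.0, 1.0))], {"Ross", "Rachel"}),
    ([("t1", (0.1, 0.0))], {"Ross"}),
    ([("t2", (0.9, 1.1))], {"Rachel"}),
    ([("t1", (1.0, 0.9))], {"Ross"}),  # a detection wrongly merged into track t1
]
track_names = find_agreements(pairs)
print(track_names)
print(refine_with_prototypes(pairs, track_names))
```

Voting assigns "Ross" to track `t1` and "Rachel" to `t2`. The prototype step then relabels the last detection as "Rachel", because its embedding lies closer to that prototype even though its track was voted "Ross". This shows, under these toy assumptions, why an entity-level refinement can fix errors left by frame-level agreement.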

Similar resources

Activity Retrieval in Closed Captioned Videos

Indexed Captioned Searchable Videos: A Learning Companion for STEM Coursework

Videos of classroom lectures have proven to be a popular and versatile learning resource. A key shortcoming of the lecture video format is accessing the content of interest hidden in a video. This work meets this challenge with an advanced video framework featuring topical indexing, search, and captioning (ICS videos). Standard optical character recognition (OCR) technology was enhanced with im...

Eye movements while viewing narrated, captioned, and silent videos.

Videos are often accompanied by narration delivered either by an audio stream or by captions, yet little is known about saccadic patterns while viewing narrated video displays. Eye movements were recorded while viewing video clips with (a) audio narration, (b) captions, (c) no narration, or (d) concurrent captions and audio. A surprisingly large proportion of time (>40%) was spent reading capti...

Recognizing Situation Patterns from Self-Contained Stories

We propose extracting information about characters and actions from a self-contained story, such as news reports. This information is stored in structure patterns called situations. We show how these situation patterns can be constructed by unifying the constituents of sentence analysis with knowledge previously stored in Typed Feature Structures. These situations can be in turn used subsequent...

Self-contained CLI Assemblies

High-level programming languages and bytecode-based virtual execution environments have become popular in software development. Bytecode-based runtimes extend embedded system by techniques to improve safety, help portability and interoperability. The ECMA/ISO Common Language Infrastructure (CLI) specifies a bytecode-based execution environment (Common Language Runtime) and a comprehensive class ...

Journal

Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications

Year: 2023

ISSN: 1551-6857, 1551-6865

DOI: https://doi.org/10.1145/3583138